Computational extraction of lexico-grammatical information for generation of Swedish intonation

Authors

  • Merle Horne
  • Marcus Filipsson
Abstract

This article discusses a number of algorithms under development which will enable the generation of prosodic structure for restricted Swedish texts. These algorithms, including a word-class tagger, a complex-word identifier and a prosodic parser, form part of a linguistic preprocessor to a text-to-speech system for the generation of intonation.

PROSODIC STRUCTURE

One of the goals of current research in text-to-speech systems is to improve the quality of intonation by developing algorithms for preprocessing texts in order to extract the grammatical and discourse information necessary for the generation of appropriate prosodic patterns. The present article describes the work we are currently carrying out, which aims to use the information on coreferentiality obtained from the previously developed referent-tracking algorithm (see Horne et al. 1993), together with further information on lexico-syntactic category designation, to group words together into a hierarchy of prosodic constituents. Whereas the referent-tracking process is important to the F0-generating component in order to predict the distribution of focal and non-focal accents, information on prosodic structure is needed in order to better predict the location as well as the particular form of the tone accents associated with utterance-internal prosodic boundaries.

Following an approach similar to that of Bachenko & Fitzpatrick (1990) and Quené & Kager (1993), and inspired by concepts within prosodic phonology (e.g. Nespor & Vogel 1986), we are attempting to determine how, using a minimal amount of parsing, one can obtain enough information to construct a hierarchical prosodic structure for each sentence in a text. Unlike other researchers, however, we also use contextual information such as coreference in our approach to generating prosodic structure. At least three levels of prosodic structure are required for Swedish in order to model all the prosodic information observed in our data (Horne 1994).
The smallest of these is the Prosodic Word, which we define as a content word together with any following function words up to the next content word within a given clause. At the beginning of a clause, the Prosodic Word can also begin with one or more function words. The Prosodic Word is characterized by a word accent and potentially a focal accent (Accent 1 = HL*(H̄L̄), Accent 2 = H*L(H̄L̄)). We use H̄ and L̄ to represent, respectively, a focal high and the low tone accent following a focal high, in order to distinguish them from the H and L associated with the word accents. The Prosodic Word is also marked by a boundary tone, which is realized as a final rise (H#) when the content word is not focussed (i.e. contextually given) or as a fall (L#) when the content word is focussed. These boundary tones, we claim, play an important role in creating the transitions between consecutive Prosodic Words in a larger Prosodic Phrase. They are also points for potential pauses, e.g. before focussed content words (see Gårding 1967, Strangert 1993).

The unit does not necessarily correspond to a syntactic constituent, as the example in (1) illustrates (‘–’ represents the boundary between Prosodic Words). This type of ‘nonsyntactic’ grouping is perhaps more characteristic of well-planned read texts or spontaneous speech than of less well-planned texts read, e.g., by a non-expert or non-professional.

(1) Kurserna på – Stockholmsbörsen – fortsätter att – falla.
    Rates(det) on – Stockholm Stock Exchange(det) – continue to – fall
    ‘Rates on Stockholm’s Stock Exchange continue to fall’

One or more Prosodic Words make up a Prosodic Phrase, which is marked by a final L% or H% boundary tone accent.
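The Prosodic Word definition above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` class, the function names and the `focussed` flag (standing in for the output of referent tracking) are hypothetical, and the focus value given to "falla" is invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    is_content: bool         # True for content words, False for function words
    focussed: bool = False   # assumed input: True if the word is new in context


def group_prosodic_words(clause: List[Token]) -> List[List[Token]]:
    """Group the tokens of one clause into Prosodic Words."""
    words: List[List[Token]] = []
    current: List[Token] = []
    for tok in clause:
        # A content word opens a new Prosodic Word, unless the current group
        # holds only clause-initial function words, which attach to it.
        if tok.is_content and any(t.is_content for t in current):
            words.append(current)
            current = []
        current.append(tok)
    if current:
        words.append(current)
    return words


def boundary_tone(prosodic_word: List[Token]) -> str:
    """Return 'L#' (fall) if the content word is focussed, else 'H#' (rise)."""
    focussed = any(t.is_content and t.focussed for t in prosodic_word)
    return "L#" if focussed else "H#"


# Example (1) from the text; the focus flag on "falla" is an assumption.
clause = [
    Token("Kurserna", True),
    Token("på", False),
    Token("Stockholmsbörsen", True),
    Token("fortsätter", True),
    Token("att", False),
    Token("falla", True, focussed=True),
]
pws = group_prosodic_words(clause)
print(" – ".join(" ".join(t.text for t in pw) for pw in pws))
```

With the word classes assumed here, the grouping reproduces the boundaries shown in (1). A real system would take the content/function distinction from the word-class tagger rather than from hand-set flags.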
Factors which determine the location of Prosodic Phrase boundaries include the following:

a) sentence boundary: a sentence boundary corresponds to the end of a Prosodic Phrase;
b) new/given distinction: a Prosodic Phrase must contain at least one focussed Prosodic Word;
c) length: a Prosodic Phrase will not exceed x syllables at a given rate of speech y.

Finally, one or more Prosodic Phrases make up a Prosodic Utterance, which is bounded by pauses. It is further generally assumed that each prosodic constituent is characterized by a certain amount of preboundary lengthening (Gussenhoven & Rietveld 1992, Wightman et al. 1992). Although we have not yet made any detailed investigations of the phenomenon in our data which would allow us to quantify a lengthening index, we assume that, all other things being equal, the higher up in the hierarchy a prosodic constituent is placed, the greater the relative duration associated with its final syllable(s) will be.

Figure 1 presents in schematic form the prosodic constituents assumed for Swedish and their phonetic correlates. The tone accents (H and L) are assumed to be associated with syllables (S) according to principles outlined in Bruce (1977). It is also assumed that the realization of the tone accents depends to a great extent on the number of syllables in a given word, i.e. the number of syllables dictates how many tones will be realized phonetically.

[Figure 1: schematic hierarchy of prosodic constituents. Node labels: PU, PPh, PW; each PW consists of a content word and optional function words, with tones associated with syllables (S).]
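The three phrase-boundary factors (a) to (c) listed above could be combined in many ways; the text does not say how they interact. The following is one possible greedy sketch, in which factor (b) takes priority over factor (c). The `ProsodicWord` class, the function name, the `max_syllables` parameter (standing in for the unspecified x) and all syllable counts and focus flags in the example are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProsodicWord:
    text: str
    syllables: int   # syllable count, supplied by the caller
    focussed: bool   # assumed input: True if the content word is new in context


def group_prosodic_phrases(sentence: List[ProsodicWord],
                           max_syllables: int) -> List[List[ProsodicWord]]:
    """Greedily split the Prosodic Words of one sentence into Prosodic Phrases."""
    phrases: List[List[ProsodicWord]] = []
    current: List[ProsodicWord] = []
    for pw in sentence:
        length = sum(w.syllables for w in current)
        has_focus = any(w.focussed for w in current)
        # (c) close the phrase when the next word would exceed the limit,
        # but only if (b) it already holds a focussed Prosodic Word.
        if current and has_focus and length + pw.syllables > max_syllables:
            phrases.append(current)
            current = []
        current.append(pw)
    # (a) the sentence boundary always closes the last phrase.
    if current:
        # (b) a remainder without focus is merged into the preceding phrase.
        if phrases and not any(w.focussed for w in current):
            phrases[-1].extend(current)
        else:
            phrases.append(current)
    return phrases


# Illustrative values only: syllable counts and focus flags are assumed.
sentence = [
    ProsodicWord("Kurserna på", 4, False),
    ProsodicWord("Stockholmsbörsen", 4, True),
    ProsodicWord("fortsätter att", 4, False),
    ProsodicWord("falla", 2, True),
]
for phrase in group_prosodic_phrases(sentence, max_syllables=8):
    print(" – ".join(pw.text for pw in phrase))
```

Because (b) is given priority in this sketch, a phrase may exceed `max_syllables` when no focussed Prosodic Word has yet been found. A different ranking of the factors, or a limit that varies with speech rate y, would need a different design.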

Related resources

Information extraction and text generation of news reports for a Swedish-English bilingual spoken dialogue system

This paper describes an experimental dialog system designed to retrieve information and generate summaries of internet news reports related to user queries in Swedish and English. The extraction component is based on parsing and on matching the parsing output against stereotypic event templates. Bilingual text generation is accomplished by filling the templates after which grammar components ge...

The Interface between Linguistic and Pragmatic Competence: The Case of Disagreement, Scolding, Requests, and Complaints

Second language learners often develop grammatical competence in the absence of concomitant pragmatic competence (Kasper & Roever, 2005) and the exact nature of the relationship between the two competences is still indistinct and in need of inquiries (Bardovi-Harlig, 1999; Khatib & Ahmadisafa, 2011). This study is a partial attempt to address the lacuna and aims to see if any relationship ca...

A multilingual FrameNet-based grammar and lexicon for controlled natural language

Berkeley FrameNet is a lexico-semantic resource for English based on the theory of frame semantics. It has been exploited in a range of natural language processing applications and has inspired the development of framenets for many languages. We present a methodological approach to the extraction and generation of a computational multilingual FrameNet-based grammar and lexicon. The approach lev...

Generation of Single-sentence Paraphrases from Predicate/Argument Structure using Lexico-grammatical Resources

Paraphrases, which stem from the variety of lexical and grammatical means of expressing meaning available in a language, pose challenges for a sentence generation system. In this paper, we discuss the generation of paraphrases from predicate/argument structure using a simple, uniform generation methodology. Central to our approach are lexico-grammatical resources which pair elementary semantic ...

Automatic Question Generation from Swedish Documents as a Tool for Information Extraction

An implementation of automatic question generation (QG) from raw Swedish text is presented. QG is here chosen as an alternative to natural query systems where any query can be posed and no indication is given of whether the current text database includes the information sought for. The program builds on parsing with grammatical functions from which corresponding questions are generated and it i...

Publication date: 1994